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Abstract 

We present a detailed comparison between onetep, our linear-scaling density functional method, 
and the conventional pseudopotential plane wave approach in order to demonstrate its high ac- 
curacy. Further comparison with all-electron calculations shows that only the largest available 
Gaussian basis sets can match the accuracy of routine onetep calculations. Results indicate 
that our minimisation procedure is not ill-conditioned and that convergence to self-consistency is 
achieved efficiently. Finally we present calculations with onetep, on systems of about 1000 atoms, 
of electronic, structural and chemical properties of a wide variety of materials such as metallic and 
semiconducting carbon nanotubes, crystalline silicon and a protein complex. 
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I. INTRODUCTION 



The formalism of Kohn-Sham density functional theory (DFT^^ for electronic structure 
calculations has become established as an approach that provides a good description of 
electronic correlation while keeping the size of calculations tractable. Nevertheless, the 
computational time taken by a conventional DFT calculation increases with the cube of 
the number of atoms. This scaling limits the size of problems that can be tackled to a few 
hundred atoms at most. As a consequence, many exciting problems which lie at the interface 
between the microscopic and mesoscopic worlds, particularly in the fields of biophysics and 
nanoscience, are out of the reach of DFT calculations. Progress towards the goal of bringing 
the predictive power of DFT to bear on these problems can be made only by developing 
approaches for DFT calculations that have linear-scaling or 0{N) instead of cubic-scaling 
computational cost. 

Even though there have been numerous theoretical developments, so far linear-scaling 
methods have not lived up to their early promise. Linear-scaling approaches are still de- 
scribed as "experimental"-^ and so far there are few examples of successful application to 
problems of interest in materials or biological sciences^. For a review see Refs. 00. Our 
ONETEP linear-scaling method for DFT calculations allows for the systematic control of 
both truncation errors and variational freedom in the basis set. For full details see Ref. Q 
and references therein. Here we demonstrate that onetep can be used to solve real prob- 
lems with the same level of confidence and general applicability as conventional cubic-scaling 
DFT approaches. 

In section IH] we begin with a brief presentation of the formalism for linear-scaling DFT 
on which ONETEP is based. In section IIIII we compare onetep with conventional well- 
established cubic-scaling methods with emphasis on the case of systematic improvement in 
the basis set, and hence in accuracy, and in the speed of self-consistent convergence. In 
section HVl we show how ONETEP can be used to explore a range of materials with thousands 
of atoms ranging from nanostructures to bulk solids to biomolecules. Finally, in section |^ 
we present our conclusions. 
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II. OVERVIEW OF THEORY 



Kohn-Sham DFT enables the problem of many interacting electrons in a static external 
potential to be mapped onto a fictitious system of non-interacting particles. Self-consistent 
solution of the resulting set of single-particle Schrodinger equations gives the ground-state 
energy and density of the original interacting problem. All the information about the ground 
state of the system is contained in the single-particle density matrix p(r, r') which, provided 
there is a band gap in the material, decays exponentially^'^'iSiyj^ as a function of the distance 
between r' and r. This property can be exploited to truncate the density matrix so that 
the amount of information it contains increases only linearly with the number of atoms. To 
perform this truncation in a practical way, the density matrix is expressed as 



where the are a set of spatially localised, nonorthogonal generalised Wannier functions 
(NGWFs)^ and the matrix K is called the density kerne^^. K can be made sparse by 
enforcing the condition K""^ = when |Rq, — Hfs\ > rent, where Rq, and R/3 are the centres 
of the localisation regions of NGWFs 0a (r) and 0/3 (r), respectively. 

ONETEP belongs to the category of methods that aim for high accuracy by optimis- 
ing self-consistently the energy not only with respect to K but also with respect to the 
NGWFsi^ii^'ii'iS'iSiSa. In onetep the NGWFs are expanded in a basis set of periodic cardinal 
sine (psinc) functionsti^*^, also known as Dirichlet or Fourier Lagrange-mesh function a^^i^? . 
Each psinc function is centred on a particular point of a regular real-space grid. Figure Q 
shows how this property is used to impose localisation on the NGWFs within predefined 
spherical regions. 

III. BASIS SET CONVERGENCE 

Since the computational cost of a DFT calculation increases with the size of the basis set 
it is important to be able to converge calculated properties to the desired accuracy using the 
smallest possible basis set. The most convenient way to achieve this is to improve the basis 
set systematically. For instance, the quality of a plane wave basis^^- is increased via a single 
parameter, the kinetic energy cut-off. At the other end of the spectrum are atomic orbital 
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FIG. 1: Imposing localisation on a NGWF {(pa)- The NGWF is expanded only in the psinc 
functions whose centres fall inside its localisation sphere. 

(AO) basis sets which do not span space in a uniform manner and whose systematic refine- 
ment is not straightforward. An AO basis is defined by a number of independent features, 
such as the number of functions per atom, and their radial and angular shapes. Furthermore, 
unlike plane waves, AO basis sets are not orthogonal and consequently the undesired effect of 
linear dependence can often hinder efforts to improve their quality. Nevertheless, numerous 
careful attempts have been made to construct series of atomic basis sets which demonstrate 
systematic improvement to varying degree32^>2L2LS&. Particular attention has been paid to 
Gaussian^S functions where the series of even-tempered^S and correlation-consistenlj^i basis 
sets are amongst the most well-known cases of AO bases with systematic behaviour. In 
ONETEP our psinc basis is constructed from plane waves in such a way that it fully retains 
their desirable properties of orthogonality and systematicity whilst being localised. 

It is important to note that the set of plane waves which constitute the psinc functions 
is different from that in a typical plane wave calculation. The relation between the two is 
clarified in Figure El The psinc basis set is constructed from plane waves e'^''" with wave- 
vectors G belonging to a cube of side-length 2Gupper in reciprocal space. On the other hand, 
conventional plane wave approachea^i such as the castep cod^ construct their basis from 
a sphere of G vectors. Therefore, to compare onetep calculations with a code such as 
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FIG. 2: The psinc basis of onetep is constructed from a cube of wavevectors. Conventional plane 
wave approaches such as castep define their plane wave basis from a sphere of wave vectors. Three 
choices of such spheres that could be used to compare onetep and castep calculations are shown. 



CASTEP, we need to decide first on the most appropriate sphere of wavevectors. Figure 
12] shows some choices for the radius of this sphere: Gupper (sphere inscribed in cube, the 
CASTEP basis is a subset of the onetep basis), Geqv (sphere has equal volume with cube, 
ONETEP and castep basis sets have an equal number of functions with most of them in 
common) and Giower (sphere circumscribes the cube, the castep basis is a superset of the 
ONETEP basis). 

In order to examine the strengths and weaknesses of onetep compared to conventional 
plane wave and AO approaches we have carried out a series of tests on the hydrogen bond 
in the formaldehyde-water complex shown in Figure El This is a rather sensitive test as 
hydrogen bonds are amongst the weakest and longest chemical bonds, yet they are very 
important because they are commonly encountered as major contributors to the structural 
stability and function of most biological macromolecules such as proteins, DNA and sugars^. 
For the purpose of comparison we have used the local density approximation (LDA)^^^ 
exchange correlation (XC) functional. 
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FIG. 3: The molecular structure of the formaldehyde- water hydrogen bonded complex used in our 
tests (not equilibrium geometry). 

Table H] shows the binding energies we obtained from calculations with castep for 
the three kinetic energy cut-offs in Figure El Also shown is the total energy of the 
bound complex. The core electrons in these calculations were replaced by norm- conserving 
pseudopotentials^^i^^^. Periodic boundary conditions were used and the molecule was 
placed in a very large cubic simulation cell (30 A x 30 A x 30 A) to ensure that the 
supercell approximation^^ holds extremely well. 

The corresponding ONETEP results are shown in Table ITTl for the same periodic simulation 
cell, pseudopotentials and LDA XC functional. A total of 16 NGWFs were used for the 
hydrogen bonded complex, one on each H atom and four on each C and O atom. We 
have performed calculations for a wide range of NGWF localisation sphere radii Hoc and we 
observe that the binding energy agrees with the converged castep value^ to 1 meV, for rioc 
as small as 3.7 A. The total energy converges rapidly from above as a function of rioc, as 
expected for a basis set variational method^S. We note that, once we are converged with 
respect to rioc, the onetep result lies between the castep bounds shown in Table IJ and, 
as one would expect, agrees closely with the 935 eV cut-off result (Ggqv sphere in Figure |2I). 
From here on we define the psinc kinetic energy cut-off to be the kinetic energy cut-off of 
the plane wave sphere with the same volume as the cube of our psinc basis. 

Table ITTl also shows the number of self-consistency iterations taken to converge the total 
energy, and we make the observation that this is independent of the localisation region 
radius rioc, which demonstrates that our method does not suffer from the "superposition 
ill-conditioning" described in Ref. 4^ 

To complete our comparison we present in Table IIIII calculations with the AO approach 



^ One kcal/mol is equal to 43.36 meV. 
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TABLE I: Calculations on the formaldehyde- water complex with castep**. 
Kinetic energy Total energy Binding energy 

cut-off (eV) (eV/atom) (meV) 
608 (oc G^pp^J -154.444 145 
935 (oc G^q^) -155.044 149 
1823 (oc Gl^J -155.082 148 



TABLE II: Calculations on the formaldehyde-water complex with ONETEPi. 



Hoc (^) Number of Total energy Binding energy 





iterations 


(eV/atom) 


(meV) 


2.6 


13 


-154.789 


168 


3.2 


13 


-154.890 


155 


3.7 


11 


-154.914 


150 


4.2 


11 


-154.921 


148 


4.8 


12 


-154.924 


148 



TABLE III: Calculations on the formaldehyde-water complex with NWChemsi using Gaussian 
basis sets of increasing size. 

Basis name Number Binding energy Counterpoise-corrected 







of AOs 


(meV) 


binding energ; 


ST0-3Gi^ 




19 


91 


39 


3-2 IGi^ 




35 


186 


92 


6-31G^ 




35 


171 


128 


6-31+G*i^i^ 




65 


159 


143 


6-31++G**i^i^ 




81 


162 


147 


cc-pVDZ & diffus( 


47.48 


111 


153 


146 


cc-pVTZil 




165 


157 


133 


cc-pVTZ & diffuse 


j47.48 


265 


149 


147 


cc-pVQZil 




350 


151 


140 


cc-pVQZ & diffuse 


47.48 


535 


148 


147 
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as implemented in the NWChen>ai quantum chemistry program which uses Gaussian basis 
sets and a closely related formula^^*^ for the LDA XC functional. In this approach the 
core electrons are treated explicitly and the molecules are virtually isolated in space as the 
calculations are done with open boundary conditions. The total number of AOs (contracted 
Gaussian functions) for the whole formaldehyde-water complex for each basis set is also 
shown in Table ITTTl 

From Table IIIII we observe that the convergence of the total energy of the complex is 
neither uniform nor rapid, as a consequence of the fact that the different features, e.g., 
diffuse functions etc., introduced to the basis set affect the energy to different extents. We 
also note that the size of the basis set required to reach the same level of accuracy as onetep 
is very large. The calculations with the Gaussian basis set suffer from basis set superposition 
error (BSSE) and thus in Table ITTTl we also give a column with binding energies calculated 
with the counterpoise correction method of Boys and Bernard*^. This costly correction 
procedure significantly improves the binding energies obtained with the medium sized basis 
sets (6-31+G* and 6-31++G**). 

IV. NANOSTRUCTURES, CRYSTALS AND BIOMOLECULES 

In this section we present several examples of calculations on systems with around 1000 
atoms. Materials and molecules with this number of atoms are usually beyond the capabil- 
ities of conventional cubic-scaling approaches. 

A. Nanostructures: Carbon nanotubes 

Carbon nanotubes are at the centre of many nanotechnology applications because of their 
unique electronic and mechanical properties'^ . From a structural point of view nanotubes are 
seamless cylinders of graphene, which can be either semiconducting or metallic. A method 
such as ONETEP where linear-scaling is achieved by taking advantage of the exponential 
decay of the density matrix present in insulators is not expected to be efficient on metallic 
systems where the decay is only algebraio^i. Metallic systems therefore present a significant 
challenge and carbon nanotubes are an ideal test case that can provide us with insight into 
how switching from a non-metallic to a metallic system (while keeping all other factors 
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essentially unchanged) affects calculations where density matrix truncation is applied. We 
have studied segments of metallic (10, 10) armchair and semiconducting (20, 0) zig-zag 
carbon nanotubesj^i*^ 

The (10, 10) nanotube segments are constructed by repeating identical units of 40 atoms 
while the (20, 0) segments are made of units of 80 atoms. For the (10, 10) nanotube we 
performed ONETEP calculations on segments consisting of 8, 15, 16, 30 and 32 units ranging 
from 320 to 1280 atoms. For the (20, 0) nanotube we used segments of 8 and 16 units 
with 640 and 1280 atoms respectively. As our nanotube segments were made of repeated 
identical units we were able to perform with castep calculations equivalent to onetep by 
using only a single unit but equivalent Brillouin zone sampling. The same LDA^^*^ XC 
functional and pseudopotential were used by both codes. The plane wave kinetic energy 
cut-off of CASTEP was set to 410 eV as was the psinc kinetic energy cut-off of onetep. In 
the ONETEP calculations the radii rioc of the carbon NGWF localisation spheres were 3.3 A. 
The nanotube segments were placed in orthorhombic simulation cells with their axis aligned 
with the z-axis. The dimensions of the cells along the x- and y-axes were 30 A x 30 A. These 
simulation cells ensured negligible interaction of the nanotubes with their periodic images 
as the diameter of the (10, 10) tubes is just 13.6 A and that of the (20, 0) tubes is 15.6 A. In 
order to perform a detailed comparison of the results between the two codes we diagonalised 
the converged onetep Hamiltonian in the NGWF representation and obtained canonical 
molecular orbitals. From these we constructed the density of states (DOS) by smearing 
with Gaussians with a halfwidth of 0.1 eV. Our results are shown in Figure lU The two 
codes give virtually identical DOS in the important regions of 1 eV below and above the 
Fermi level and very close agreement in the region below -1 eV. In the region above 1 eV 
the agreement deteriorates rapidly. This is not surprising as the NGWFs of onetep are 
specifically optimised to describe the density matrix which is composed of occupied bands 
and no emphasis is placed on the description of the conduction bands. It is still remarkable 
that the low-lying conduction band DOS is calculated correctly with onetep. 

As we make our (10, 10) nanotube segments longer, we increase the number of closely- 
spaced k-points from the metallic band structure of the nanotube that we fold into our 
equivalent F point description of the band structure and the density matrix. We have found 
that as the number of segments increases, it becomes more and more difficult to impose 
a finite density kernel cut-off threshold r^ut in onetep while maintaining any degree of 
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FIG. 4: Top panel: The density of states (DOS) of a 30-unit segment of a (10, 10) metallic 
nanotube as calculated with onetep and castep. On the right the onetep (30Ax30Ax73.30A) 
and CASTEP (30Ax30Ax2.44A) simulation cells are shown. Bottom panel: The density of states 
(DOS) of a 16-unit segment of a (20, 0) semiconducting nanotube as calculated with onetep and 
CASTEP. On the right the onetep (30Ax30Ax67.78A) and castep (30Ax30Ax4.24A) simulation 
cells are shown. 
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accuracy. With the 30 and 32 unit segments an infinite rcut becomes essential in order 
to obtain useful results. In contrast, the (20, 0) nanotube remains amenable to density 
kernel truncation as the length of its segments is increased. For example in Figure E] we 
show the DOS for the 16 unit segment generated with rcut = oo and with rcut = 15.9 A 
and the two curves essentially coincide. Our observations are thus consistent with expected 
behaviour regarding the decay of the density matrix in metallic and non-metallic systems 
at zero temperature. 

ONETEP calculations with rcut = oo, while not linear-scaling, are still perfectly feasible. In 
particular, most computationally intensive steps such as the construction of the Hamiltonian 
matrix in the NGWF representation, the construction of the electronic charge density and 
the calculation of the derivatives of the NGWFs with respect to the expansion coefficients 
in the psinc basis depend only in the NGWF localisation sphere radii rioc and are always 
perfectly linear-scaling independently of the value of rcut- The only step that stops being 
linear-scaling when the density kernel K is no longer sparse is the optimisation of K which 
is carried out by using variants of the Li-Nunes-Vanderbil1)^ method and Haynea^ penalty 
functional method which involve matrix multiplications. 

It is also worth noting that unlike conventional plane wave approaches where the memory 
and computation grows with the entire volume of the simulation cell without distinction 
between vacuum and atomic regions, onetep uses algoritmns^^*^ which avoid computation 
and storage in vacuum regions making thus possible calculations in very large simulation 
cells as in this section. 

B. Solids: crystalline silicon 

Here we examine properties of pure crystalline silicon as calculated by onetep and 
CASTEP. For these calculations we have used the LDA with a norm- conserving pseudopo- 
tential and plane wave and psinc kinetic energy cut-offs of 283 eV. A cubic unit cell of 1000 
atoms was used in the onetep calculations and a cubic unit cell of 8 atoms was used in the 
CASTEP calculations, with an equivalent 5x5x5 k-point mesh. The two cells are shown in 
Figure El We should note that in a code like castep there are two ways to define the basis 
set while varying the energy with respect to the lattice parameter. One can either keep the 
kinetic energy cut-off Ecut constant or keep the number of plane wave basis functions A^pw 
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FIG. 5: Periodic crystalline silicon. Left: The 8-atom cubic simulation cell used in the calculations 
with CASTEP. Right: The 1000-atom cubic simulation cell used in the calculations with onetep. 

TABLE IV: Lattice constant and bulk modulus of perfect crystalline silicon as calculated by 
CASTEP, ONETEP and experiment. 



Method Kinetic energy Lattice constant Bulk modulus 





cut-off (eV) 


(A) 


(GPa) 




184 


5.410 


94.7 


CASTEP, constant Ecut 


283 


5.392 


94.4 




551 


5.383 


95.9 




184 


5.359 


109.1 


CASTEP, constant A'pw 


283 


5.380 


96.1 




551 


5.382 


96.6 


ONETEP, constant A'psinc, ''cut = oo 


283 


5.406 


99.6 


ONETEP, constant iVpsinc, rcut = 9.5 A 


283 


5.421 


99.5 


Experiment 




5.430 


100.0 



constant. The latter approach is conceptually closer to the way the ONETEP calculations 
are performed in these cases as it is the number of psinc functions that is kept constant, 
which is equivalent to keeping constant the number of plane waves in the cube of Figure |21 
Furthermore, in the ONETEP calculations, when varying the lattice parameter, it is impor- 
tant to scale rioc and rcut proportionately. Throughout this section the values we report for 
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FIG. 6: The total energy per 8-atom unit cell of silicon as a function of the lattice parameter for 
calculations with castep and onetep. 

these quantities correspond to a lattice parameter of 5.43 A. 

In Figure ini we show castep constant- A^pw plots of the total energy per 8-atom cell as a 
function of the lattice parameter for kinetic energy cut-offs which correspond to the "upper 
bound" (184 eV), "equivalent" (283 eV) and "lower bound" (551 eV) cases of FigureH The 
two ONETEP curves lie higher in energy than the CASTEP curves because the NGWF radii 
we used were only 3.2 A and the total energy is not yet completely converged with respect 
to them. Nevertheless, the physical properties that we calculate are already converged to a 
very satisfactory level. 

By fitting the calculated energies as a function of the lattice parameter to the Birch- 
Murnaghan equation of state^^ we obtained values for the lattice constants and bulk moduli 
of crystalline silicon which we show in Table IIVI There is excellent agreement between the 
ONETEP and CASTEP constant-A^pw results at 283 eV. For the case of the infinite rcut the 
lattice constants agree to 0.5% and the bulk moduli to 3.6% while for the case of the 9.5 A 
Tcut the lattice constants agree to 0.8% and bulk moduli to 3.5%. 

The bulk modulus is a quantity which is sensitive to calculation parameters and difficult 
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to converge. Even between the castep calculations with the highest cut-off of 551 eV there 
remains a difference of 0.7% between the bulk modulus values obtained with constant Ecut 
and constant A^pw while the lattice constant difference in this case is reduced to only 0.02%. 

C. Biomolecules: breast cancer susceptibility proteins 

Biomolecules are generaly too large for conventional DFT calculations. Nevertheless a 
number of insightful studies have been carried out where a small fragment can be isolated 
from the rest of the biomolecule^i^. Obviously this approach cannot be applied in cases 
where the interactions extend over a large area, e.g., the case of two large proteins bound 
to each other. ONETEP can offer great advantages in the study of such molecules since it 
allows one to perform calculations either on entire biomolecules or at least on segments 
large enough to contain the entire area of interaction. An example of the latter case is the 
RAD51-BRCA2 protein complex for which we present preliminary results in this section. 

The breast cancer susceptibility protein^ BRCA2 regulates the function of RAD51, an 
enzyme involved in DNA recombination. Crucial to this process is the specific interaction 
between RAD51 and a BRC motif (sub-region) in BRCA2. There are eight slightly different 
versions of the BRC motif in a single BRCA2 protein and each of these motifs can interact 
with a RAD51 protein. Recently the structure of RAD51 bound to one of the BRC motifs 
(BRC4) has been elucidated by high resolution X-ray diffraction, revealing in a qualitative 
manner the nature of the interactions at the site of contact between the two proteins^. 
Amino acids with both polar and hydrophobic side chains are involved in these interactions. 
With this crystal structure as our starting point, we have used calculations with onetep 
to predict the strength of the binding between the two proteins. The 988-atom protein 
segment we have studied here (Figure IH} consists of the entire BRC4 motif and only the 
A5 a-helix of the RAD51. According to Pellegrini et al.^ the major bonding interaction 
between A5 and BRC4 is a polar interaction involving hydrogen bonding between the side 
chains of arginine 250 of A5 and glutamic acid 1548 of BRC4. We have first studied just this 
interaction in isolation by cutting a very small 97-atom segment from the crystal structure 
of the proteins (Figure IZj) which contains the relevant amino acids Arg 205 and Glu 1548. 
The two hydrogen bonds between the cationic side chain of the arginine and the anionic 
chain of the glutamic acid of the segment are depicted in Figure [3 For this segment we 
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FIG. 7: 97-atom segment which includes the bonding interactions between the Arg 250 - Glu 
1548 residues of the BRCA2-RAD51 complex. Left: stick model of the atomic structure. Middle: 
isosurface of the electronic change density at a value of 0.02 e~ /a^ from the onetep calculation. 
Right: isosurface of the electronic charge density difference due to bonding at a value of 0.00075 
e~ /qq from the onetep calculation. 




FIG. 8: The 988-atom A5-BRC4 complex. Left: tertiary structure. Middle: stick model in atomic 
detail with the side groups of Arg 250 and Glu 1548 shown in space-filling form. Right: isosurface 
of the electronic charge density at a value of 0.02 e~/ag from the onetep calculation. 

were able to perform calculations with both onetep and castep. We have used norm- 
conserving pseudopotentials, plane wave and psinc kinetic energy cutoffs of 608 eV, cubic 
simulation cells of 25 A x 25 A x 25 A and the Perdew-Burke-Ernzerhof (PBE)^ exchange- 
correlation functional. The radii rioc of the NGWF localisation spheres were set to 3.2 A 
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for the hydrogen atoms and 3.6 A for all other atoms. The binding energies between these 
two fragments as calculated by castep and onetep are 2.78 eV and 2.79 eV respectively. 
Besides the excellent agreement between the two codes it is worth observing that this binding 
energy is large in comparison to the energy of two regular hydrogen bonds (which individually 
range from about 100 to 300 meV). It appears that the bulk of the binding strength comes 
from the electrostatic interaction between the +1 charge of the arginine side group and the 
-1 charge of the carboxyl of the glutamic acid. Indeed, the classical electrostatic energy of 
a system of two point charges of +1 and -1 atomic units separated by the same distance as 
the centres of the side chains of these amino acids, is about 3.10 eV which is rather close 
to the calculated binding energy. Calculations with the NWChem code with the 6-3 1+G* 
Gaussian basis set and the PBE functional produced a binding energy of 2.87 eV which after 
the counterpoise correction for ESSE became 2.82 eV, in very good agreement with castep 
and ONETEP given the level of accuracy that can be reached with a Gaussian basis set of 
this quality f section ITTT]) . 

For the BRC4-A5 complex of FigurelHlthe same calculation parameters as for the 97-atom 
segment were used except for the orthorhombic 60 A x 50 A x 60 A simulation cell. The 
binding energy between the whole A5 helix and the BRC4 motif that we obtained from our 
calculations with onetep is 5.67 eV. This is about twice as much as the binding energy of 
the small segment of Figure [7| and it shows that the remaining interactions between A5 and 
BRC4, though small individually, cannot be neglected. 

V. CONCLUSIONS 

In comparison with two well-established cubic-scaling density functional methods, we 
have demonstrated that onetep can routinely achieve the highest levels of accuracy that 
are possible with these methods. Amongst the factors that make this possible is the fact 
that in onetep the calculated properties converge rapidly with the radii of the localisation 
spheres of nonorthogonal generalised Wannier functions (NGWFs) and the rate of self- 
consistent convergence is affected neither by the size of these regions nor the number of 
atoms. Next we have demonstrated the wide applicability of the method by presenting 
exploratory calculations in systems of about 1000 atoms from a wide variety of materials. 
We have studied semiconducting and metallic nanotubes, crystalline silicon, and the complex 
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of two bound proteins that can play a role in the development of breast cancer. In all these 
cases we have managed to obtain excellent agreement with CASTEP in comparing either on 
smaller systems of the same material or, where possible by the use of k-points, in systems of 
equivalent size. These results confirm that ONETEP is a robust, highly-accurate linear-scaling 
density functional approach which makes possible a whole new level of large scale simulation 
in systems of interest to nanotechnology, biophysics and condensed matter physics. 
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